-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cache-Control to rustdoc pages #1569
Conversation
@syphar would you like to review this? I know you've had concerns about caching. |
@jsha thanks for the initiative! Would I think this (+ #845 and also adding this to From the top of my head:
I remember a discussion with pietro about caching where these things being always up-to-date was quite important. Of course we can discuss different goals. In #1552 this would have been solved by not caching any rustdoc pages in the browser but only in the CDN, which we actively invalidate on releases. |
From my research at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html, it looks like no. That page doesn't specifically mention stale-while-revalidate but I take that as meaning it's not respected. Note that I am setting max-age=10 minutes in addition to stale-while-revalidate. If that's a concern at CloudFront, we could set that to 0, or we could set s-maxage to 0 (which only affects shared caches).
Also for the sake of a very valuable feature (#845): being able to view previously visited pages offline. And if we ask the question "would we accept out of date UX if we are offline," I think the answer is definitely yes. The user knows the content will be out of date because they're offline, but keeping access to it is more valuable than losing access because a new crate version might have been uploaded while they were offline.
With #1562 that won't be an issue, since the "go to latest version" will go through a redirect that sends you to
I think that's okay for a few reasons. Most actions on docs.rs involve reading the docs, not opening the dropdown, so it would be rare for someone to see a stale list of versions in the dropdown. And docs.rs isn't the canonical up-to-date source for whether there are newer versions available, or which versions are yanked - crates.io is. The most important property of yankedness, of course, is what cargo does at the command line.
True, though if the user does any significant browsing they are likely to wind up visiting a page twice, which would result in getting the updated version, because it was revalidated after the first visit to that page. Also worth noting that the current system doesn't show the most up-to-date possible information. For instance, when a new release is in the build queue, that doesn't trigger the "Go to latest version" link on the previous version (because there would be nowhere to go), and the dropdown doesn't show the new version yet. Also, users may have a doc page loaded in their browser for arbitrarily long, during which time the "latest" info and the dropdown list of releases may become out of date. |
Thanks for the clarification! For me a 10 minute cache is totally fine, and typically even takes quite much load from the origin server.
@jsha please correct me if I'm wrong, but
That is what I assumed, thanks!
👍
👍 Regarding the design-question: Personally I'm more on the side that outdated docs for a short amount of time (like 30 minutes max) is totally fine. I remember @pietroalbini wanting it always up-to-date, and about @jyn514 I'm not sure. |
I think being outdated for a short time is fine; as long as |
Yes, for pages with stale-while-revalidate (in this PR, the explicitly-versioned pages) each request will be served from the existing local cache, while the browser updates the cache in the background. So for instance if you navigate away and back, or load the page a second time, you'll have a fresh copy. |
For the record, I manually tested if the header is actually used, In chrome I couldn't make it work, though I'm perhaps missing something. One thing that I just remembered: If it's only an issue for the CDN we could set |
Though it could also be fine for CSP when the nonce only changes every 10 minutes? I'm not sure |
Two questions: would caching+CSP block things that should work, and would it allow things that should be blocked? A little background: An example of XSS is when a site has some query parameter (or other input) that gets interpolated into the page without proper escaping. For example if CSP aims to stop XSS by saying "I already know all the scripts that will be on my page, so don't run any others." The most desirable CSP policy says "don't run any inline scripts (because that's usually how XSS manifests), and only run scripts that are hosted on my domain." Unfortunately that's not practical in a lot of cases. Many sites, like docs.rs, have inline scripts and aren't in a position to quickly get rid of them. How do you mark your inline scripts safe without marking the attacker's injected script as safe? You set a nonce in the CSP header and use the same nonce in your So what happens when the response is cached? The CSP header (with its nonce) will be cached with it. The cached page contains matching nonces on the appropriate scripts, so the appropriate scripts will continue to run on each page load. The "nonce" is now predictable to the attacker for ten minutes, which is a potential problem. They could start sending requests to So, overall my take is: I think it's fine. Also, we don't set CSP headers on rustdoc pages (though I'm guessing we'd eventually like to). Also, the nonces approach is not terrific; IMO it would be better to get rid of inline scripts and use hashes to identify the acceptable versions of main.js, etc. |
Any further thoughts on this? To shorten the above: I think there's no interaction between this and CSP because we don't currently set CSP on Rustdoc pages. And I don't think deploying this gets in the way of a future deployment of CSP on Rustdoc pages. |
I'm sorry for my recent lack of responses here. The only thing I don't feel know enough about to approve is probably the combined CSP&caching part. I'm not sure who in the team knows more? ( @jyn514 ? @Nemo157 ? ) |
Note that there isn't really a combined CSP&caching part here. The rustdoc pages this affects don't actually serve a CSP header. |
@jsha short update here: since we're changing caching-related things I prefer going a safe (and slowly increasing) approach here (I admit I'm definitely stigmatized by caching/CDN related issues over the last >20 years). This means two things:
|
I remember @pietroalbini set me up with access to this a while ago, I think https://forge.rust-lang.org/infra/docs/aws-access.html is the relevant page. |
Yeah, let's see if it's this, even better would be extended permissions for our access key, so we can build invalidation into the app. |
Thanks for the update @syphar! That's a nice sensible approach, and I definitely understand your caution around caching issues. I'll update the PR to make those things into config variables. |
@jsha I now have the access we need and we can continue on this PR, when you find the time. Since we didn't test this behind CloudFront I would still like to have both settings in config variables and slowly increase them in production. Now that I think about it, settings should be optional and the header should be omitted when the settings are missing. |
For /latest/ URLs, set max-age=0. For versioned URLs max-age=10 minutes and stale-while-revalidate=2 months. The idea behind this is that versioned URLs change mostly in minor ways - the "Go to latest" link at the top, and the list of versions in the crate menu. And setting a long cache time (either via max-age or via stale-while-revalidate) allows pages to be loaded even while offline. We could probably apply a long stale-while-revalidate to /latest/ URLs as well, but this is more likely to have a user-noticeable impact, and the /latest/ URLs are relatively new so we don't want to create any confusing interactions.
c55705b
to
4097c39
Compare
Sorry for the delay! Pushed an update with the config options. |
1627bf3
to
8b71c52
Compare
8b71c52
to
85ed1f4
Compare
I want to start pushing the caching topics again, @jsha is there anything new that needs to be added here? Otherwise I would continue test, merge & deploy. I'll also dig through other pieces where we could add caching following similar rules ( |
This should be good to go! Thanks for pushing it forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
manual test is fine, we will slowly start and set & watch these settings and their effect.
For /latest/ URLs, set max-age=0. For versioned URLs max-age=10 minutes and stale-while-revalidate=2 months.
The idea behind this is that versioned URLs change mostly in minor ways - the "Go to latest" link at the top, and the list of versions in the crate menu. And setting a long cache time (either via max-age or via stale-while-revalidate) allows pages to be loaded even while offline.
We could probably apply a long stale-while-revalidate to /latest/ URLs as well, but this is more likely to have a user-noticeable impact, and the /latest/ URLs are relatively new so we don't want to create any confusing interactions.
Part of #845